fix(connector): Skip floating-point MAP key subfield extraction by officialasishkumar · Pull Request #27511 · prestodb/presto

officialasishkumar · 2026-04-05T06:31:10Z

Description

Skip subfield extraction for MAP subscripts whose key type is DOUBLE or REAL, because those keys cannot be represented losslessly as Subfield paths.

Apply the same guard in both Hive's SubfieldExtractor and planner-side PushdownSubfields, including the map_subset and map_filter helpers.

Motivation and Context

Fixes #27507.

Subfield can represent integer and string MAP subscripts, but floating-point MAP keys were being coerced through longValue(), which could incorrectly turn expressions like x[0.99] into x[0] during pruning and pushdown. The safe behavior is to leave those expressions unpushed.

Impact

Correctness fix for Hive and planner subfield pushdown. Queries over MAP columns keyed by DOUBLE or REAL keep working, but those specific key accesses no longer participate in lossy subfield pruning and pushdown.

Test Plan

Added TestPushdownSubfields coverage for direct MAP subscripts and map_subset/map_filter.
Ran ./mvnw -pl presto-main-base -am -Dair.check.skip-all -DskipUI -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestPushdownSubfields test.
Ran ./mvnw -pl presto-hive -am -Dair.check.skip-all -DskipUI -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestSubfieldExtractor,TestDomainTranslator test.
Added TestHiveLogicalPlanner coverage for the pushdown and pruning cases; I did not execute that class locally because the available JDK 25 runtime trips a Hadoop Subject.getSubject incompatibility during Hive query-runner initialization.

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

Summary by Sourcery

Prevent subfield pushdown and pruning for map accesses with floating-point keys and add regression tests across planner and Hive components.

Bug Fixes:

Avoid lossy subfield extraction for DOUBLE and REAL map keys in planner-side subfield pushdown logic.
Skip subfield extraction for DOUBLE and REAL map keys in Hive's SubfieldExtractor and domain translation.

Tests:

Add planner-level TestPushdownSubfields to cover map subscript and map_* function behavior with numeric and floating-point keys.
Extend Hive logical planner, subfield extractor, and domain translator tests to verify that floating-point map key predicates are not pushed down or translated.

Floating-point map keys cannot be represented as Subfield paths without lossy coercion to LongSubscript. Keep those expressions out of subfield extraction and pushdown instead of truncating them. Apply the guard in both Hive's SubfieldExtractor and the planner-side PushdownSubfields logic, including map_subset and map_filter helpers, and add regression coverage for the extractor, domain translation, optimizer extraction, and Hive logical planning. Fixes prestodb#27507 Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>

linux-foundation-easycla · 2026-04-05T06:31:17Z

The committers listed above are authorized under a signed CLA.

✅ login: officialasishkumar / name: Asish Kumar (e977b02)

sourcery-ai · 2026-04-05T06:31:18Z

Reviewer's Guide

Implements a correctness fix to subfield pushdown for MAPs keyed by DOUBLE or REAL by teaching both the planner and Hive subfield extraction logic to skip subfield extraction for floating‑point MAP keys, and adds focused unit and integration tests to verify no subfield pruning/pushdown occurs for these keys while preserving existing behavior for integer and varchar keys.

Sequence diagram for Hive SubfieldExtractor with floating-point MAP keys

sequenceDiagram
    actor HiveSplitReader
    participant SubfieldExtractor

    HiveSplitReader->>SubfieldExtractor: extract(expression)
    activate SubfieldExtractor
    SubfieldExtractor->>SubfieldExtractor: toSubfield(expression, functionResolution, expressionOptimizer, connectorSession)
    alt expression is MAP subscript with constant key
        SubfieldExtractor->>SubfieldExtractor: hasFloatingPointMapKey(mapType)
        alt map key type is DOUBLE or REAL
            SubfieldExtractor-->>HiveSplitReader: Optional.empty
            HiveSplitReader->>HiveSplitReader: read full MAP column (no subfield pruning)
        else map key type is integer or varchar
            SubfieldExtractor->>SubfieldExtractor: add Subfield.LongSubscript(key.longValue())
            SubfieldExtractor-->>HiveSplitReader: Optional.of(subfield)
            HiveSplitReader->>HiveSplitReader: push down subfield to storage
        end
    else expression is other supported pattern
        SubfieldExtractor-->>HiveSplitReader: Optional.of(subfield or empty)
        deactivate SubfieldExtractor
        HiveSplitReader->>HiveSplitReader: existing pruning behavior
    end

Updated class diagram for PushdownSubfields and SubfieldExtractor floating-point MAP key handling

classDiagram
    class PushdownSubfields {
        +PlanOptimizerResult optimize(PlanNode plan, Session session, TypeProvider types, ...) 
        +static Optional~List~Subfield~~~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession, FunctionAndTypeManager functionAndTypeManager, boolean isPushdownSubfieldsForMapFunctionsEnabled)
    }

    class Rewriter {
        +PlanNode rewrite(PlanNode node, Context context, PlanRewriter planRewriter)
        +static Optional~List~Subfield~~~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession, FunctionAndTypeManager functionAndTypeManager, boolean isPushdownSubfieldsForMapFunctionsEnabled)
        -static Optional~List~Subfield~~~ extractSubfieldsFromArray(ConstantExpression constantArray, VariableReferenceExpression mapVariable)
        -static Optional~Subfield~ extractSubfieldsFromSingleValue(ConstantExpression mapKey, VariableReferenceExpression mapVariable)
        -static boolean hasFloatingPointMapKey(Type type)
        -static boolean isFloatingPointType(Type type)
        -static NestedField nestedField(String name)
    }

    class SubfieldExtractor {
        -RowExpression expression
        -FunctionResolution functionResolution
        -ExpressionOptimizer expressionOptimizer
        -ConnectorSession connectorSession
        +Optional~Subfield~ extract(RowExpression expression)
        -static Optional~Subfield~ toSubfield(RowExpression expression, FunctionResolution functionResolution, ExpressionOptimizer expressionOptimizer, ConnectorSession connectorSession)
        -static boolean hasSubscripts(Optional~Subfield~ subfield)
        -static boolean hasFloatingPointMapKey(Type type)
        -static boolean isFloatingPointType(Type type)
    }

    class Type {
    }

    class MapType {
        +Type getKeyType()
        +Type getValueType()
    }

    class Subfield {
        class PathElement
        +List~PathElement~ getPath()
        class LongSubscript
    }

    PushdownSubfields o-- Rewriter
    SubfieldExtractor ..> Subfield
    Rewriter ..> Subfield
    Rewriter ..> MapType
    SubfieldExtractor ..> MapType
    MapType --|> Type
    SubfieldExtractor ..> Type
    Rewriter ..> Type
    Subfield ..> Subfield.PathElement
    Subfield ..> Subfield.LongSubscript

File-Level Changes

Change	Details	Files
Guard subfield extraction and pruning logic so MAPs with DOUBLE/REAL keys do not produce Subfield paths.	Expose PushdownSubfields.Rewriter.toSubfield via a package-visible static wrapper for direct testing. Teach PushdownSubfields.Rewriter.toSubfield to bail out when encountering map/array subscripts on MAP types whose key type is DOUBLE or REAL instead of coercing to long. Update helper methods that extract subfields from constant arrays and single map keys to return empty when applied to MAPs with floating‑point key types. Factor out hasFloatingPointMapKey and isFloatingPointType helpers to centralize the DOUBLE/REAL checks.	`presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PushdownSubfields.java`
Align Hive-side SubfieldExtractor behavior with planner by skipping floating-point MAP key subfield extraction and adding tests.	Add hasFloatingPointMapKey and isFloatingPointType helpers mirroring planner-side logic. Short-circuit SubfieldExtractor.toSubfield for MAP subscripts when the underlying MAP key type is DOUBLE or REAL so no LongSubscript is created. Extend TestSubfieldExtractor with REAL/DOUBLE MAP-key cases that must not yield Subfield paths. Extend TestDomainTranslator to ensure predicates over DOUBLE/REAL MAP subscripts do not translate to subfield domains.	`presto-hive-common/src/main/java/com/facebook/presto/hive/SubfieldExtractor.java` `presto-hive/src/test/java/com/facebook/presto/hive/TestSubfieldExtractor.java` `presto-hive/src/test/java/com/facebook/presto/hive/TestDomainTranslator.java`
Extend planner and Hive integration tests to cover pushdown/pruning behavior for DOUBLE/REAL MAP keys and map_subset/map_filter.	Add TestPushdownSubfields covering map subscript, map_subset, and map_filter extraction, including negative cases for DOUBLE/REAL key maps and a small mapType helper. Extend TestHiveLogicalPlanner’s test tables with DOUBLE/REAL MAP columns and assert that filter pushdown uses predicateColumns without generating domain predicates for floating-point MAP keys. Add map_subset/map_filter scenarios for MAP<double,double> to assert that no subfield pushdown occurs when keys are floating point. Introduce an assertNoPushdownFilterOnSubfields helper that inspects HiveTableLayoutHandle to verify absence of domainPredicate while ensuring remainingPredicate is preserved.	`presto-main-base/src/test/java/com/facebook/presto/sql/planner/optimizations/TestPushdownSubfields.java` `presto-hive/src/test/java/com/facebook/presto/hive/TestHiveLogicalPlanner.java`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#27507	Change SubfieldExtractor so that floating-point MAP key subscripts (DOUBLE/REAL) are explicitly rejected instead of being silently truncated and used as LongSubscript, ideally with a clear user-visible error message.	❌	The PR changes SubfieldExtractor to detect floating-point MAP key types and return Optional.empty() (skipping subfield extraction) instead of constructing a LongSubscript from the numeric index. This avoids truncation but does not raise any error or user-visible rejection; queries continue to run, just without subfield pruning. The behavior is "skip optimization" rather than "reject with a clear error" as requested in the issue.
#27507	Eliminate the unsafe reliance on truncating floating-point MAP key subscripts to longs for subfield pruning/pushdown across the optimizer and Hive reader, by ensuring planner-side subfield extraction and related map_* helpers do not attempt subfield pushdown for floating-point map keys.	✅

Possibly linked issues

Reject float MAP key subscripts in SubfieldExtractor instead of silently truncating #27507: They solve the same floating-point MAP key truncation problem by disabling subfield extraction/pushdown for DOUBLE/REAL keys.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've left some high level feedback:

The floating-point map-key checks (hasFloatingPointMapKey / isFloatingPointType) are now duplicated in both PushdownSubfields and SubfieldExtractor; consider centralizing this logic in a shared utility (or at least aligning naming and placement) to reduce divergence risk.
The new REAL-key tests use floatToRawIntBits to build REAL constants, which is a bit opaque; adding a small helper or comment explaining why the bit-cast is required would make these tests easier to maintain and less error-prone.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The floating-point map-key checks (`hasFloatingPointMapKey` / `isFloatingPointType`) are now duplicated in both `PushdownSubfields` and `SubfieldExtractor`; consider centralizing this logic in a shared utility (or at least aligning naming and placement) to reduce divergence risk.
- The new REAL-key tests use `floatToRawIntBits` to build `REAL` constants, which is a bit opaque; adding a small helper or comment explaining why the bit-cast is required would make these tests easier to maintain and less error-prone.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

officialasishkumar requested review from a team, feilong-liu and jaystarshot as code owners April 5, 2026 06:31

sourcery-ai bot reviewed Apr 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(connector): Skip floating-point MAP key subfield extraction#27511

fix(connector): Skip floating-point MAP key subfield extraction#27511
officialasishkumar wants to merge 1 commit intoprestodb:masterfrom
officialasishkumar:fix/27507-float-map-subfield-extraction

officialasishkumar commented Apr 5, 2026 •

edited by sourcery-ai bot

Loading

Uh oh!

linux-foundation-easycla bot commented Apr 5, 2026 •

edited

Loading

Uh oh!

sourcery-ai bot commented Apr 5, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

officialasishkumar commented Apr 5, 2026 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

Release Notes

Summary by Sourcery

Uh oh!

linux-foundation-easycla bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sourcery-ai bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for Hive SubfieldExtractor with floating-point MAP keys

Updated class diagram for PushdownSubfields and SubfieldExtractor floating-point MAP key handling

File-Level Changes

Assessment against linked issues

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

officialasishkumar commented Apr 5, 2026 •

edited by sourcery-ai bot

Loading

linux-foundation-easycla bot commented Apr 5, 2026 •

edited

Loading

sourcery-ai bot commented Apr 5, 2026 •

edited

Loading